Approximating Nearest Neighbor Distances

Authors

  • Michael B. Cohen
  • Brittany Terese Fasy
  • Gary L. Miller
  • Amir Nayyeri
  • Don Sheehy
  • Ameya Velingker
Abstract

Several researchers have proposed using non-Euclidean metrics on point sets in Euclidean space for clustering noisy data. In most cases, the desired distance function recognizes the closeness of points in the same cluster, even if the Euclidean diameter of the cluster is large. It is therefore preferable to assign smaller costs to paths that stay close to the input points. In this paper, we consider a natural metric with this property, which we call the nearest neighbor metric. Given a point set P and a path γ, the length of γ under this metric is the integral of the distance to P along γ. We describe a (3 + ε)-approximation algorithm and a more intricate (1 + ε)-approximation algorithm to compute the nearest neighbor metric. Both approximation algorithms run in near-linear time. The former uses shortest paths on a sparse graph defined over the input points. The latter uses a sparse sample of the ambient space to find good approximate geodesic paths.
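The definition in the abstract (the integral of the distance to P along a path γ) can be illustrated numerically. The following is a minimal sketch, not the paper's approximation algorithm: it evaluates the cost of a given polygonal path by a midpoint rule. The function names `dist_to_set` and `nn_path_length` and the parameter `steps_per_segment` are hypothetical and chosen only for this illustration.

```python
import math


def dist_to_set(x, points):
    """Euclidean distance from x to its nearest point in `points`."""
    return min(math.dist(x, p) for p in points)


def nn_path_length(path, points, steps_per_segment=1000):
    """Approximate the integral of the distance to `points` along a
    polygonal path, using the midpoint rule on each straight segment.

    `path` is a list of vertices; `points` is the input point set P.
    This only evaluates one given path; it does not search for the
    cheapest path, which is what the metric itself requires.
    """
    total = 0.0
    for a, b in zip(path, path[1:]):
        seg_len = math.dist(a, b)
        h = seg_len / steps_per_segment  # arc length of one sub-step
        for i in range(steps_per_segment):
            t = (i + 0.5) / steps_per_segment  # midpoint of sub-step i
            x = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
            total += dist_to_set(x, points) * h
    return total


# Two input points, two candidate paths between them.
P = [(0.0, 0.0), (2.0, 0.0)]
straight = nn_path_length([(0.0, 0.0), (2.0, 0.0)], P)              # about 1.0
detour = nn_path_length([(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)], P)    # about 2.0

# Adding an input point on the straight path makes that path cheaper.
dense = nn_path_length([(0.0, 0.0), (2.0, 0.0)], P + [(1.0, 0.0)])  # about 0.5
```

In this toy setting the straight path costs about 1.0, the detour through (1, 1) costs about 2.0, and inserting a third input point at (1, 0) halves the straight path's cost to about 0.5. This matches the stated motivation: paths that stay close to the input points are assigned smaller costs.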

Similar resources

When Crossings Count — Approximating the Minimum

We present a (1+ε)-approximation algorithm for computing the minimum spanning tree of points in a planar arrangement of lines, where the metric is the number of crossings between the spanning tree and the lines. The expected running time of the algorithm is near linear. We also show how to embed such a crossing metric of hyperplanes in d dimensions, in subquadratic time, into high-dimensions s...

Exact and Approximate Reverse Nearest Neighbor Search for Multimedia Data

Reverse nearest neighbor queries are useful in identifying objects that are of significant influence or importance. Existing methods either rely on pre-computation of nearest neighbor distances, do not scale well with high dimensionality, or do not produce exact solutions. In this work we motivate and investigate the problem of reverse nearest neighbor search on high dimensional, multimedia dat...

Improved methods for the imputation of missing data by nearest neighbor methods

Missing data is an important issue in almost all fields of quantitative research. A nonparametric procedure that has been shown to be useful is the nearest neighbor imputation method. We suggest a weighted nearest neighbor imputation method based on Lq-distances. The weighted method is shown to have smaller imputation error than available NN estimates. In addition we consider weighted neighbor ...

Estimation of Density using Plotless Density Estimator Criteria in Arasbaran Forest

Sampling methods have a theoretical basis and should be operational in different forests; therefore selecting an appropriate sampling method is effective for accurate estimation of forest characteristics. The purpose of this study was to estimate the stand density (number per hectare) in Arasbaran forest using a variety of the plotless density estimators of the nearest neighbors sampling me...

Nearest-neighbor distribution for singular billiards

The exact computation of the nearest-neighbor spacing distribution P(s) is performed for a rectangular billiard with a point-like scatterer inside, for periodic and Dirichlet boundary conditions, and it is demonstrated that when s → ∞ this function decreases exponentially. Together with the results of Ref. [13], this proves that the spectral statistics of such systems are of intermediate type characteriz...

Publication date: 2015